{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial 10: Traffic Lights"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "This tutorial walks through how to add traffic lights to experiments. This tutorial will use the following files:\n",
    "\n",
    "* Experiment config for RL version of traffic lights in grid: `examples/exp_configs/rl/singleagent/singleagent_traffic_light_grid.py`\n",
    "* Experiment config for non-RL version of traffic lights in grid: `examples/exp_configs/non_rl/traffic_light_grid.py`\n",
    "* Network: `traffic_light_grid.py` (class TrafficLightGridScenario)\n",
    "* Environment for RL version of traffic lights in grid: (class TrafficLightGridEnv)\n",
    "* Environment for non-RL version of traffic lights in grid: (class AccelEnv)\n",
    "\n",
    "There are two main classes of traffic lights that Sumo supports: (1) actuated and (2) static traffic lights. This tutorial will cover both types. Moreover, in this tutorial, we'll discuss another type of traffic light. In total, we have 4 types of traffic lights in the Flow:\n",
    "\n",
    "1. Static Traffic Lights --> (Section 3)\n",
    "2. Actuated Traffic Lights --> (Section 4)\n",
    "3. Actuated Baseline Traffic Lights --> (Section 5)\n",
    "4. RL Traffic Lights --> (Section 6)\n",
    "\n",
    "Let's begin!\n",
    "\n",
    "First, import all necessary classes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from flow.core.params import NetParams\n",
    "from flow.networks.traffic_light_grid import TrafficLightGridNetwork\n",
    "from flow.core.params import TrafficLightParams\n",
    "from flow.core.params import SumoParams, EnvParams, InitialConfig, NetParams, \\\n",
    "    InFlows, SumoCarFollowingParams\n",
    "from flow.core.params import VehicleParams\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. New parameters in `additional_net_params`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a few unique additions to `additional_net_params` in the traffic light grid environments to be aware of. They are the following 2 items:\n",
    "\n",
    "#### grid_array\n",
    "`grid_array` passes information on the road network to the network, specifying the parameters you see below: `row_num`, `col_num`, `inner_length`, `short_length`, `long_length`, `cars_top`, `cars_bot`, `cars_left`, `cars_right`. This is required for any traffic light grid experiment.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### tl_logic\n",
    "`tl_logic` should be used for users who want to exert more control over individual traffic lights. `tl_logic` simply tells the env whether the traffic lights are controlled by RL or whether a default pattern or SUMO actuation is to be used. Use \"actuated\" if you want SUMO to control the traffic lights. \n",
    "\n",
    "For this tutorial, we will assume the following parameters for the `grid_array`, which specifies a traffic light grid network with 2 rows and 3 columns. `traffic_lights` should be set to `True` for every experiment in this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inner_length = 300\n",
    "long_length = 500\n",
    "short_length = 300\n",
    "n = 2 # rows\n",
    "m = 3 # columns\n",
    "num_cars_left = 20\n",
    "num_cars_right = 20\n",
    "num_cars_top = 20\n",
    "num_cars_bot = 20\n",
    "tot_cars = (num_cars_left + num_cars_right) * m \\\n",
    "    + (num_cars_top + num_cars_bot) * n\n",
    "\n",
    "grid_array = {\"short_length\": short_length, \"inner_length\": inner_length,\n",
    "              \"long_length\": long_length, \"row_num\": n, \"col_num\": m,\n",
    "              \"cars_left\": num_cars_left, \"cars_right\": num_cars_right,\n",
    "              \"cars_top\": num_cars_top, \"cars_bot\": num_cars_bot}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Defining Traffic Light Phases \n",
    "\n",
    "\n",
    "To start off, we define how SUMO represents traffic light phases. A phase is defined as the states that the traffic lights around an intersection can take. The phase of a typical four-way, traffic-light-controlled intersection is modeled by a string (of length 4, 8, or 12, etc., depending on the structure of the intersection). \n",
    "\n",
    "Consider the phase \"GrGr\". Every letter in this phase string (\"G\", \"r\", \"G\", \"r\") corresponds to a signal of an edge in the intersection, in clockwise order (starting from the northbound). Explicitly, the northern and southern edges of the intersection both have a state of \"G\" (green), where the eastern and western edges of the intersection both have a state of \"r\" (red). In this example, the intersection has 4 edges, each edge has one lane, and the only possible direction is going straight. \n",
    "\n",
    "\n",
    "Each character within a phase's state describes the state of one signal of the traffic light. Please note, that a single lane may contain several signals - for example one for vehicles turning left and one for vehicles which move straight (in this case, we may have something like \"GgrrGgrr\"). In other words, a signal does not control lanes, but links - each connecting a lane which is incoming into a junction to one which is outgoing from this junction.\n",
    "\n",
    "For more information about traffic light states, please refer to https://sumo.dlr.de/wiki/Simulation/Traffic_Lights#Signal_state_definitions\n",
    "\n",
    "\n",
    "NOTE: If the API is used at any point to modify the traffic light state, i.e. functions such as `setRedYellowGreenState`, this will override the traffic light's default phase.\n",
    "\n",
    "To do anything with traffic lights, you should interface with Flow's `TrafficLightParams` class"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the `TrafficLightParams` class is instantiated, traffic lights can be added via the `add` function. One prerequisite of using this function is knowing the node id of any node you intend to manipulate. This information is baked into the experiment's network class, as well as the experiment's `nod.xml` file. For the experiment we are using with 2 rows and 3 columns, there are 6 nodes: \"center0\" to \"center5\". \n",
    "\n",
    "This will be the ordering of \"centers\" in our network:\n",
    "\n",
    "     | | |\n",
    "    -3-4-5-\n",
    "     | | |\n",
    "    -0-1-2-\n",
    "     | | |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tl_logic = TrafficLightParams()\n",
    "\n",
    "nodes = [\"center0\", \"center1\", \"center2\", \"center3\", \"center4\", \"center5\"]\n",
    "phases = [{\"duration\": \"31\", \"state\": \"GrGr\"},\n",
    "          {\"duration\": \"6\", \"state\": \"yryr\"},\n",
    "          {\"duration\": \"31\", \"state\": \"rGrG\"},\n",
    "          {\"duration\": \"6\", \"state\": \"ryry\"}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this particular example, each of the 6 intersections corresponds to the same set of possible phases; in other words, at any time, all intersections will be at the same phase in this example. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for node_id in nodes:\n",
    "    tl_logic.add(node_id, tls_type=\"static\", programID=\"1\", offset=None, phases=phases)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can, however, customize a network in which each traffic light node has different phases. \n",
    "\n",
    "Following this step, the instance `tl_logic` of `TrafficLightParams` class should be passed into the network as element `traffic_lights`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "additional_net_params = {\"grid_array\": grid_array, \"speed_limit\": 35,\n",
    "                         \"horizontal_lanes\": 1, \"vertical_lanes\": 1,\n",
    "                         \"traffic_lights\": True}\n",
    "net_params = NetParams(additional_params=additional_net_params)\n",
    "\n",
    "network = TrafficLightGridNetwork(name=\"grid\",\n",
    "                            vehicles=VehicleParams(),\n",
    "                            net_params=net_params,\n",
    "                            initial_config=InitialConfig(),\n",
    "                            traffic_lights=tl_logic)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's it! The traffic light logic will be passed into Flow's internals, which will generate an additional file containing all of the information needed to generate the traffic lights you specified in the simulation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Static Traffic Lights \n",
    "\n",
    "Static traffic lights are traffic lights with pre-defined phases. They cannot dynamically adjust according to the traffic needs; they simply follow the same pattern repeatedly. To see static traffic lights in action, the `TrafficLightParams` object should be instantiated with `baseline=False`. \n",
    "\n",
    "When adding individual traffic lights, the following parameters in addition to `node_id` are involved:\n",
    "\n",
    "* `tls_type`:  _[optional]_ str, specifies actuated or static traffic lights, defaults to static\n",
    "* `programID`:  _[optional]_ str, the program name for this traffic light. It cannot be the same ID as the base program, which is 0, defaults to 10\n",
    "* `offset`: _[optional]_ int, the initial time offset of the program\n",
    "\n",
    "An example of adding one static traffic light to our system is as follows:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tl_logic = TrafficLightParams(baseline=False)\n",
    "phases = [{\"duration\": \"31\", \"state\": \"GrGr\"},\n",
    "          {\"duration\": \"6\", \"state\": \"yryr\"},\n",
    "          {\"duration\": \"31\", \"state\": \"rGrG\"},\n",
    "          {\"duration\": \"6\", \"state\": \"ryry\"}]\n",
    "tl_logic.add(\"center0\", phases=phases, programID=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Actuated Traffic Lights"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more flexibility than the static traffic lights defined above, and more control than RL-controlled traffic lights, actuated traffic lights are a good option to consider.\n",
    "\n",
    "To explain the actuated traffic lights, we refer to an excerpt from SUMO's documentation: \"SUMO supports gap-based actuated traffic control. This control scheme is common in Germany and works by prolonging traffic phases whenever a continuous stream of traffic is detected. It switches to the next phase after detecting a sufficent time gap between sucessive vehicles. This allows for better distribution of green-time among phases and also affects cycle duration in response to dynamic traffic conditions.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between phases for static and actuated traffic lights is that actuated traffic light has two additional parameters in `phases`, namely `minDur` and `maxDur`, which describe the allowed range of time durations for each phase. `minDur` is the minimum duration the phase will be held for, and `masDur` is the maximum duration the phase will be held for.\n",
    "\n",
    "In addition to these parameters of `phases` and all the required parameters of static of traffic lights, the following optional parameters are involved. The default values are set by SUMO: \n",
    "\n",
    "* `maxGap`: _[optional]_ int, describes the maximum time gap between successive vehicle sthat will cause the current phase to be prolonged\n",
    "* `detectorGap`: _[optional]_ int, determines the time distance between the (automatically generated) detector and the stop line in seconds (at each lane's maximum speed)\n",
    "* `showDetectors`: _[optional]_ bool, toggles whether or not detectors are shown in sumo-gui\n",
    "* `file`: _[optional]_ str, the file into which the detector shall write results\n",
    "* `freq`: _[optional]_ int, the period over which collected values shall be aggregated\n",
    "\n",
    "An example of adding two actuated traffic lights to our system is as follows. The first trafic lights corresponds to more custom control, while the second one specifies minimal control."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tl_logic = TrafficLightParams(baseline=False)\n",
    "phases = [{\"duration\": \"31\", \"minDur\": \"8\", \"maxDur\": \"45\", \"state\": \"GrGr\"},\n",
    "          {\"duration\": \"6\", \"minDur\": \"3\", \"maxDur\": \"6\", \"state\": \"yryr\"},\n",
    "          {\"duration\": \"31\", \"minDur\": \"8\", \"maxDur\": \"45\", \"state\": \"rGrG\"},\n",
    "          {\"duration\": \"6\", \"minDur\": \"3\", \"maxDur\": \"6\", \"state\": \"ryry\"}]\n",
    "\n",
    "tl_logic.add(\"center1\", \n",
    "             tls_type=\"actuated\", \n",
    "             programID=\"1\", \n",
    "             phases=phases, \n",
    "             maxGap=5.0, \n",
    "             detectorGap=0.9, \n",
    "             showDetectors=False)\n",
    "\n",
    "tl_logic.add(\"center2\",\n",
    "             tls_type=\"actuated\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Actuated Baseline Traffic Lights"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have developed an actuated traffic light \"baseline\" that can be used for any experiments on a grid. This baseline uses actuated traffic lights (section 4), and has been fine-tuned on many iterations of experiments with varying parameters. The actual parameters are located in the `TrafficLightParams` class under the getter function `actuated_default()`. For reference, these values are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tl_type = \"actuated\"\n",
    "program_id = 1\n",
    "max_gap = 3.0\n",
    "detector_gap = 0.8\n",
    "show_detectors = True\n",
    "phases = [{\"duration\": \"31\", \"minDur\": \"8\", \"maxDur\": \"45\", \"state\": \"GrGr\"},\n",
    "        {\"duration\": \"6\", \"minDur\": \"3\", \"maxDur\": \"6\", \"state\": \"yryr\"},\n",
    "        {\"duration\": \"31\", \"minDur\": \"8\", \"maxDur\": \"45\", \"state\": \"rGrG\"},\n",
    "        {\"duration\": \"6\", \"minDur\": \"3\", \"maxDur\": \"6\", \"state\": \"ryry\"}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the actuated baseline traffic lights in action, simply initialize the TrafficLightParams class with the `baseline` argument set to `True`, and pass it into the `additional_net_params`. Nothing else needs to be done; no traffic lights need to be added. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tl_logic = TrafficLightParams(baseline=True)\n",
    "additional_net_params = {\"grid_array\": grid_array, \n",
    "                         \"speed_limit\": 35,\n",
    "                         \"horizontal_lanes\": 1, \n",
    "                         \"vertical_lanes\": 1,\n",
    "                         \"traffic_lights\": True, \n",
    "                         \"tl_logic\": tl_logic}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Controlling Your Traffic Lights via RL\n",
    "\n",
    "This is where we switch from the non-RL experiment script to the RL experiment. \n",
    "\n",
    "To control traffic lights via RL, no `tl_logic` element is necessary. This is because the RL agent is controlling all the parameters you were able to customize in the prior sections. The `additional_net_params` should look something like this: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "additional_net_params = {\"speed_limit\": 35, \"grid_array\": grid_array,\n",
    "                         \"horizontal_lanes\": 1, \"vertical_lanes\": 1,\n",
    "                         \"traffic_lights\": True}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will enable the program to recognize all nodes as traffic lights. The experiment then gives control to the environment; we are using `TrafficLightGridEnv`, which is an environment created for applying RL-specified traffic light actions (e.g. change the state) via TraCI.\n",
    "\n",
    "This is all you need to run an RL experiment! It is worth taking a look at the `TrafficLightGridEnv` class to further understanding of the experiment internals. The rest of this tutorial is an optional walkthrough through the various components of `TrafficLightGridEnv`:\n",
    "\n",
    "### Keeping Track of Traffic Light State\n",
    "\n",
    "\n",
    "Flow keeps track of the traffic light states (i.e. for each intersection, time elapsed since the last change, which direction traffic is flowing, and whether or not the traffic light is currently displaying yellow) in the following variables:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# keeps track of the last time the traffic lights in an intersection were allowed to change \n",
    "# (the last time the lights were allowed to change from a red-green state to a red-yellow state).\n",
    "self.last_change = np.zeros((self.rows * self.cols, 1))\n",
    "# keeps track of the direction of the intersection (the direction that is currently being allowed\n",
    "# to flow. 0 indicates flow from top to bottom, and 1 indicates flow from left to right.)\n",
    "self.direction = np.zeros((self.rows * self.cols, 1))\n",
    "# value of 1 indicates that the intersection is in a red-yellow state (traffic lights are red for \n",
    "# one way (e.g. north-south), while the traffic lights for the other way (e.g. west-east) are yellow.\n",
    "# 0 indicates that the intersection is in a red-green state.\n",
    "self.currently_yellow = np.zeros((self.rows * self.cols, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* The variable `self.last_change` indicates the last time the lights were allowed to change from a red-green state to a red-yellow state.\n",
    "* The variable `self.direction` indicates the direction of the intersection, i.e. the direction that is currently being allowed to flow. 0 indicates flow from top to bottom, and 1 indicates flow from left to right.\n",
    "* The variable `self.currently_yellow` with a value of 1 indicates that the traffic light is in a red-yellow state. 0 indicates that the traffic light is in a red-green state.\n",
    "\n",
    "`self.last_change` is contingent on an instance variable `self.min_switch_time`. This is a variable that can be set in `additional_env_params` with the key name `switch_time`. Setting `switch_time` enables more control over the RL experiment by preventing traffic lights from switching until `switch_time` timesteps have occurred. In practice, this can be used to prevent flickering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "additional_env_params = {\"target_velocity\": 50, \"switch_time\": 3.0}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Elements of RL for Controlling Traffic Lights"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "#### Action Space"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The action space may be any set of actions the user wishes the agent to do. In this example, the action space for RL-controlled traffic lights directly matches the number of traffic intersections in the system. Each intersection (traffic light node) corresponds to an action. The action space is thus defined as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@property\n",
    "def action_space(self):\n",
    "    if self.discrete:\n",
    "            return Discrete(2 ** self.num_traffic_lights)\n",
    "        else:\n",
    "            return Box(\n",
    "                low=0,\n",
    "                high=1,\n",
    "                shape=(self.num_traffic_lights,),\n",
    "                dtype=np.float32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the case that the action space is discrete, we need 1-bit (that can be 0 or 1) for the action of each traffic light node. Hence, we need `self.num_traffic_lights` bits to represent the action space. To make a `self.num_traffic_lights`-bit number, we use the pyhton's `Discrete(range)`, and since we have `self.num_traffic_lights` bits, the `range` will be 2^`self.num_traffic_lights`.\n",
    "\n",
    "In the case that the action space is continuous, we use a range (that is currently (0,1)) of numbers for each traffic light node. Hence, we will define `self.num_traffic_lights` \"Boxes\", each in the range (0,1). \n",
    "\n",
    "Note that the variable `num_traffic_lights` is actually the number of intersections in the grid system, not the number of traffic lights. Number of traffic lights in our example is 4 times the number of intersections"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Observation Space"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The observation space may be any set of state information the user wishes to provide to the agent. This information may fully or partially describe the state of the environment. The existing observation space for this example is designed to be a fully observable state space with the following metrics. For all vehicle, we want to know its velocity, its distance (in [unit]) from the next intersection, and the unique edge it is traveling on. For each traffic light, we want to know its current state (i.e. what direction it is flowing), when it last changed, and whether it was yellow. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@property\n",
    "def observation_space(self):\n",
    "    speed = Box(\n",
    "            low=0,\n",
    "            high=1,\n",
    "            shape=(self.initial_vehicles.num_vehicles,),\n",
    "            dtype=np.float32)\n",
    "        dist_to_intersec = Box(\n",
    "            low=0.,\n",
    "            high=np.inf,\n",
    "            shape=(self.initial_vehicles.num_vehicles,),\n",
    "            dtype=np.float32)\n",
    "        edge_num = Box(\n",
    "            low=0.,\n",
    "            high=1,\n",
    "            shape=(self.initial_vehicles.num_vehicles,),\n",
    "            dtype=np.float32)\n",
    "        traffic_lights = Box(\n",
    "            low=0.,\n",
    "            high=1,\n",
    "            shape=(3 * self.rows * self.cols,),\n",
    "            dtype=np.float32)\n",
    "        return Tuple((speed, dist_to_intersec, edge_num, traffic_lights))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that in the case that the observation space is not fully-observable (e.g. cannot observe all the vehicles in the system), the observation space should be changed to only include those state information that are observable (e.g. velocity of N closest vehicles to an intersection)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### State Space"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The state space collects the information that the `observation_space` specifies. There are helper functions that exist in the `TrafficLightGridEnv` to construct the state space. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_state(self):\n",
    "    # compute the normalizers\n",
    "        grid_array = self.net_params.additional_params[\"grid_array\"]\n",
    "        max_dist = max(grid_array[\"short_length\"],\n",
    "                       grid_array[\"long_length\"],\n",
    "                       grid_array[\"inner_length\"])\n",
    "\n",
    "        # get the state arrays\n",
    "        speeds = [\n",
    "            self.k.vehicle.get_speed(veh_id) / self.k.network.max_speed()\n",
    "            for veh_id in self.k.vehicle.get_ids()\n",
    "        ]\n",
    "        dist_to_intersec = [\n",
    "            self.get_distance_to_intersection(veh_id) / max_dist\n",
    "            for veh_id in self.k.vehicle.get_ids()\n",
    "        ]\n",
    "        edges = [\n",
    "            self._convert_edge(self.k.vehicle.get_edge(veh_id)) /\n",
    "            (self.k.network.network.num_edges - 1)\n",
    "            for veh_id in self.k.vehicle.get_ids()\n",
    "        ]\n",
    "\n",
    "        state = [\n",
    "            speeds, dist_to_intersec, edges,\n",
    "            self.last_change.flatten().tolist(),\n",
    "            self.direction.flatten().tolist(),\n",
    "            self.currently_yellow.flatten().tolist()\n",
    "        ]\n",
    "        return np.array(state)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reward"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The agents in an RL network will learn to maximize a certain reward. This objective can be defined in terms of maximizing rewards or minimizing the penalty. In this example, we penalize the large delay and boolean actions that indicate a switch (with the negative sign)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_reward(self, rl_actions, **kwargs):\n",
    "        return - rewards.min_delay_unscaled(self) - rewards.boolean_action_penalty(rl_actions >= 0.5, gain=1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Apply RL Actions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the `_apply_rl_actions` function, we specify what actions our agents should take in the environment. In this example, the agents (traffic light nodes) decide based on the action value how to change the traffic lights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def _apply_rl_actions(self, rl_actions):\n",
    "        \"\"\"See class definition.\"\"\"\n",
    "        # check if the action space is discrete\n",
    "        if self.discrete:\n",
    "            # convert single value to list of 0's and 1's\n",
    "            rl_mask = [int(x) for x in list('{0:0b}'.format(rl_actions))]\n",
    "            rl_mask = [0] * (self.num_traffic_lights - len(rl_mask)) + rl_mask\n",
    "        else:\n",
    "            # convert values less than 0.5 to zero and above 0.5 to 1. 0 \n",
    "            # indicates that we should not switch the direction, and 1 indicates\n",
    "            # that switch should happen\n",
    "            rl_mask = rl_actions > 0.5\n",
    "\n",
    "        # Loop through the traffic light nodes    \n",
    "        for i, action in enumerate(rl_mask):\n",
    "            if self.currently_yellow[i] == 1:  # currently yellow\n",
    "                # Code to change from yellow to red\n",
    "                ...\n",
    "            else:\n",
    "                # Code to change to yellow\n",
    "                ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These are the portions of the code that are hidden from the above code for shortening the code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "                # Code to change from yellow to red\n",
    "                self.last_change[i] += self.sim_step\n",
    "                # Check if our timer has exceeded the yellow phase, meaning it\n",
    "                # should switch to red\n",
    "                if self.last_change[i] >= self.min_switch_time:\n",
    "                    if self.direction[i] == 0:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(i),\n",
    "                            state=\"GrGr\")\n",
    "                    else:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(i),\n",
    "                            state='rGrG')\n",
    "                    self.currently_yellow[i] = 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "                    # Code to change to yellow\n",
    "                    if action:\n",
    "                    if self.direction[i] == 0:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(i),\n",
    "                            state='yryr')\n",
    "                    else:\n",
    "                        self.k.traffic_light.set_state(\n",
    "                            node_id='center{}'.format(i),\n",
    "                            state='ryry')\n",
    "                    self.last_change[i] = 0.0\n",
    "                    self.direction[i] = not self.direction[i]\n",
    "                    self.currently_yellow[i] = 1"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
